INVERTIBILITY OF RANDOM SUBMATRICES VIA
TAIL-DECOUPLING AND A MATRIX CHERNOFF
INEQUALITY



STEPHANE CHRETIEN AND SEBASTIEN DARSES 



Abstract. Let $X$ be an $n \times p$ real matrix with coherence $\mu(X) =
\max_{j \neq j'} |X_j^t X_{j'}|$. We present a simplified and improved study of the
quasi-isometry property for most submatrices of $X$ obtained by uniform
column sampling. Our results depend on $\mu(X)$, the operator norm $\|X\|$
and the dimensions with explicit constants, which improve the previously
known values by a large factor. The analysis relies on a tail-decoupling
argument, of independent interest, and a recent version of
the Non-Commutative Chernoff inequality (NCCI).

1. Introduction 

1.1. Problem statement. Let $\mathbb{R}^{n \times p}$ denote the set of all $n \times p$ real matrices.
For any $M \in \mathbb{R}^{n \times p}$, we denote by $M^t$ its transpose and by $\|\cdot\|$ its operator
norm:

$\|M\| := \max_{x \in \mathbb{R}^p,\ \|x\|_2 = 1} \|Mx\|_2, \qquad \|x\|_2^2 = x^t x.$

Let $X \in \mathbb{R}^{n \times p}$ and let $T$ be a random index subset of size $s$ of $\{1, \ldots, p\}$ drawn
from the uniform distribution. Let $X_T$ denote the submatrix obtained by
extracting the columns $X_j$ of $X$ indexed by $j \in T$. We say that $X_T$ is
an $r_0$-quasi-isometry if $\|X_T^t X_T - \mathrm{Id}\| \le r_0$ (quasi-isometry property). The
goal of this paper is to propose a new upper bound for the probability that
the submatrix $X_T$ fails to be an $r_0$-quasi-isometry. In the sequel, we assume
that the columns of $X$ have unit norm.

Proving that the quasi-isometry property holds with high probability has 
applications in Compressed Sensing and high-dimensional statistics based 
on sparsity. The uniform version of the quasi-isometry property, i.e., sat- 
isfied for all possible T's, is called the Restricted Isometry Property (RIP) 
and has been widely studied for independent, identically distributed (i.i.d.) 
sub-Gaussian matrices [7]. Recent works such as [2] proved that the quasi-isometry
property holds with high probability for matrices with sufficiently
small coherence $\mu(X) := \max_{j \neq j'} |X_j^t X_{j'}|$. Unlike checking the RIP,
computing $\mu(X)$ can be achieved in polynomial time. Results of this type are
therefore of great potential interest for a wide class of problems involving
high-dimensional linear or nonlinear regression models.
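
As a minimal sketch of the polynomial-time claim, the coherence can be read off the Gram matrix in $O(np^2)$ operations, and the quasi-isometry defect of one sampled submatrix is a single spectral-norm computation. The helper names `coherence` and `quasi_isometry_defect`, the Gaussian test matrix and the sizes below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def coherence(X):
    """Largest absolute inner product between two distinct columns of X."""
    G = X.T @ X                        # Gram matrix, p x p
    off = G - np.diag(np.diag(G))      # drop the diagonal, keep j != j'
    return np.abs(off).max()

def quasi_isometry_defect(X, T):
    """Operator norm of X_T^t X_T - Id for a column index set T."""
    XT = X[:, T]
    return np.linalg.norm(XT.T @ XT - np.eye(len(T)), 2)

rng = np.random.default_rng(0)
n, p, s = 64, 256, 8                   # illustrative sizes (assumption)
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)         # normalize columns to unit norm
T = rng.choice(p, size=s, replace=False)   # uniform subset of size s
mu = coherence(X)
defect = quasi_isometry_defect(X, T)
```

For unit-norm columns, Gershgorin's theorem gives the crude deterministic bound `defect <= (s - 1) * mu`; the probabilistic results of the paper concern much larger values of $s$ than this bound allows.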

Let $\{\delta_j\}$ denote a sequence of i.i.d. Bernoulli 0-1 random variables with
expectation $\delta$. Let $R$ denote the square diagonal "selector matrix" whose $j$-th
diagonal entry is $\delta_j$. Following the landmark papers of Bourgain and Tzafriri
[1] (see also [3]) and Rudelson [8], Tropp [10] established, in particular, a
bound for $(\mathbb{E}\|R(X^t X - \mathrm{Id})R\|^p)^{1/p}$, $p \in [2, \infty)$. As in [8], the proof heavily
relies on the Non-Commutative Khintchine inequality. Using Tropp's result,
Candes and Plan proved in [2, Theorem 3.2] that $X_T$ is a 1/2-quasi-isometry
with probability greater than $1 - p^{-2\log(2)}$ when $s \le p/(4\|X\|^2)$ and the
coherence $\mu(X)$ is sufficiently small. The quasi-isometry property for $r_0 = \frac{1}{2}$
then holds with high probability under easily checked assumptions on $X$.

1.2. Our contribution. The present paper aims at giving a more precise 
and self-contained version of Theorem 3.2 in [2]. Our result yields explicit 
constants, which improve the previously known values by a large factor. The 
analysis relies on a tail-decoupling argument, of independent interest, and a
recent version of a Non-Commutative Chernoff inequality (NCCI) [11].

1.3. Additional notations. For $S \subset \{1, \ldots, p\}$, we denote by $|S|$ the cardinality
of $S$. Given a vector $x \in \mathbb{R}^p$, we set $x_T = (x_j)_{j \in T} \in \mathbb{R}^{|T|}$.

We denote by $\|M\|_{1 \to 2}$ the maximum $\ell_2$-norm of a column of $M \in \mathbb{R}^{n \times p}$,
and by $\|M\|_{\max}$ the maximum absolute entry of $M$.
In the present paper, we consider the 'hollow Gram' matrix $H$:

(1.1)    $H = X^t X - \mathrm{Id}.$

In the sequel, $R'$ will always denote an independent copy of the selector
matrix $R$. Let $R_s$ be a diagonal matrix whose diagonal is a random vector $\delta^{(s)}$
of length $p$, uniformly distributed on the set of all vectors with $s$ components
equal to 1 and $p - s$ components equal to 0. Notice that when $\delta = s/p$, the
support of the diagonal of $R$ has cardinality close to $s$ with high probability,
by a standard concentration argument.
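
The two sampling models can be contrasted with a short sketch. The variable names and the sizes are assumptions chosen for illustration; only the two distributions themselves come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
p, s = 1000, 50                 # illustrative sizes (assumption)
delta = s / p

# Bernoulli model: independent 0-1 entries with mean delta (diagonal of R).
diag_R = (rng.random(p) < delta).astype(int)

# Uniform model: exactly s ones at uniformly chosen positions (diagonal of R_s).
diag_Rs = np.zeros(p, dtype=int)
diag_Rs[rng.choice(p, size=s, replace=False)] = 1

support_R = int(diag_R.sum())    # random, concentrates around s
support_Rs = int(diag_Rs.sum())  # always exactly s
```

The support size of the Bernoulli selector is Binomial$(p, \delta)$, with standard deviation at most $\sqrt{s}$, which is the concentration alluded to above.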

2. Preliminary results 

2.1. On Rademacher chaos of order 2. Let $\{\eta_i\}$ be a sequence of i.i.d.
Rademacher random variables. Theorem 3.2.2 in [6, p. 113] gives the following
general result: a Banach-valued homogeneous chaos $X$ of order $d$,

$X = \sum_{1 \le i_1 < \cdots < i_d \le p} X_{i_1 \cdots i_d}\, \eta_{i_1} \cdots \eta_{i_d},$

verifies

$(\mathbb{E}\|X\|^q)^{1/q} \le \left(\frac{q-1}{p-1}\right)^{d/2} (\mathbb{E}\|X\|^p)^{1/p}, \qquad 1 < p < q < \infty.$

We give an elementary proof in the real case with $d = 2$ and $q = 2p = 4$,
which yields a better constant.

Lemma 2.1. Let $x_{ij} \in \mathbb{R}$, $1 \le i, j \le p$. The homogeneous Rademacher
chaos of order 2, $\xi = \sum_{i<j} x_{ij}\eta_i\eta_j$, verifies

(2.2)    $\mathbb{E}\,\xi^4 \le 9\,(\mathbb{E}\,\xi^2)^2.$

Proof. The multinomial formula applied to $\xi$ raised to the positive power $q$
gives

(2.3)    $\xi^q = \sum \frac{q!}{\prod \alpha_{ij}!} \prod x_{ij}^{\alpha_{ij}} \prod (\eta_i\eta_j)^{\alpha_{ij}},$

where the sum is over all integers $\alpha_{ij}$, $i < j$, such that $\sum \alpha_{ij} = q$, and the
products are over all the indices $(i, j)$, $i < j$, ordered via the lexicographical
order, still denoted by '<'. From now on, let these conventions hold.

Case $q = 2$. The partitions of 2 are 2 + 0's and 1 + 1 + 0's. Consider
the partition 1 + 1 + 0's, say $\alpha_{kl} = \alpha_{k'l'} = 1$ for some 4-tuple $(k, l, k', l')$ with
$k \le k'$. We have $(k, l) \neq (k', l')$, $k < l$ and $k' < l'$. Thus,

$\mathbb{E}[\eta_k \eta_l \eta_{k'} \eta_{l'}] = 0.$

Therefore, $\mathbb{E}\,\xi^2$ only depends on the partition 2 + 0's, and one has

$\mathbb{E}\,\xi^2 = \sum_{i<j} x_{ij}^2.$

Case $q = 4$. The partitions of 4 are 4, 2 + 2, 3 + 1, 2 + 1 + 1 and
1 + 1 + 1 + 1 (we now omit the zeros).

First, using the same arguments as in the case $q = 2$, we show that the
terms in $\mathbb{E}\,\xi^4$ corresponding to the partitions 3 + 1 and 2 + 1 + 1 vanish.

Second, the partitions 1 + 1 + 1 + 1 involve four different couples $(i, i')$,
$(j, j')$, $(k, k')$ and $(l, l')$ (recall that $i < i'$, etc., and that the couples are
lexicographically ordered). The only terms corresponding to the partitions
1 + 1 + 1 + 1 whose expectation does not vanish are of the form

$x_{i_1 j_1}\, x_{i_1 j_2}\, x_{i_2 j_1}\, x_{i_2 j_2},$

i.e., the four couples $(i_1, j_1) < (i_1, j_2) < (i_2, j_1) < (i_2, j_2)$ are the vertices of a
rectangle in the upper off-diagonal part of the matrix $(x_{ij})$. We denote by
$\mathcal{R}$ the set of all these rectangles whose vertices are lexicographically ordered.

Figure 1. The matrix $(x_{ij})$ where a 'rectangle' of $\mathcal{R}$ is drawn.

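Finally, the $\alpha_{ij}$'s corresponding to the partitions 4 and 2 + 2 are even:
$\alpha_{ij} = 2\beta_{ij}$, with $\sum \beta_{ij} = 2$. Therefore

(2.4)    $\mathbb{E}\,\xi^4 = \sum \frac{4!}{\prod (2\beta_{ij})!} \prod x_{ij}^{2\beta_{ij}} + 4! \sum_{\mathcal{R}} x_{i_1 j_1} x_{i_1 j_2} x_{i_2 j_1} x_{i_2 j_2} =: A + B.$

But

$\frac{4!}{\prod (2\beta_{ij})!} \le 3\, \frac{2!}{\prod \beta_{ij}!}, \qquad \text{so that} \qquad A \le 3 \Big(\sum_{i<j} x_{ij}^2\Big)^2,$

and

$B \le 12 \sum_{\mathcal{R}} \big(x_{i_1 j_1}^2 x_{i_2 j_2}^2 + x_{i_1 j_2}^2 x_{i_2 j_1}^2\big) \le 12 \sum_{i<i',\ j<j',\ (i,i')<(j,j')} x_{ii'}^2\, x_{jj'}^2 \le 6 \Big(\sum_{i<i'} x_{ii'}^2\Big)^2.$

The second inequality for $B$ stems from relaxing the constraints induced by
$\mathcal{R}$ and illustrated in Fig. 1. Using (2.4), we obtain the desired result. □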

Remark 2.2. The ratio $\mathbb{E}\,\xi^4/(\mathbb{E}\,\xi^2)^2$ will be used in the proof of Prop. 4.1.
We gain a factor 9 compared to the constant $\left(\frac{q-1}{p-1}\right)^{qd/2} = 81$.
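
For small $p$, the two moments entering the ratio of Remark 2.2 can be evaluated exactly by enumerating all $2^p$ sign patterns. The sketch below does this for an arbitrary coefficient array; the function name `chaos_moments` and the Gaussian coefficients are assumptions made for illustration.

```python
import itertools
import numpy as np

def chaos_moments(x):
    """Exact E[xi^2] and E[xi^4] for xi = sum_{i<j} x[i, j] * eta_i * eta_j."""
    p = x.shape[0]
    iu = np.triu_indices(p, k=1)            # index pairs with i < j
    m2 = m4 = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=p):
        eta = np.array(signs)
        xi = np.sum(x[iu] * eta[iu[0]] * eta[iu[1]])
        m2 += xi ** 2
        m4 += xi ** 4
    n_patterns = 2 ** p                      # all sign patterns are equally likely
    return m2 / n_patterns, m4 / n_patterns

rng = np.random.default_rng(2)
x = rng.standard_normal((6, 6))              # only the entries with i < j are used
m2, m4 = chaos_moments(x)
ratio = m4 / m2 ** 2
```

The second moment returned here agrees with $\sum_{i<j} x_{ij}^2$, and `ratio` is the quantity that (2.2) compares with its constant.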

2.2. A Non-Commutative Chernoff inequality. We will also need a
corollary of a Matrix Chernoff inequality recently established in [11].

Theorem 2.3. (Matrix Chernoff Inequality [11]) Let $X_1, \ldots, X_p$ be independent
random positive semi-definite matrices taking values in $\mathbb{R}^{d \times d}$. Set
$S_p = \sum_{j=1}^p X_j$. Assume that for all $j \in \{1, \ldots, p\}$, $\|X_j\| \le B$ a.s. and

$\Big\|\sum_{j=1}^p \mathbb{E}\,X_j\Big\| \le \mu_{\max}.$

Then, for all $r \ge e\,\mu_{\max}$,

$\mathbb{P}(\|S_p\| \ge r) \le d \left(\frac{e\,\mu_{\max}}{r}\right)^{r/B}.$

(Set $r = (1 + \delta)\mu_{\max}$ and use $e^{\delta} \le e^{1+\delta}$ in Theorem 1.1 of [11].)
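
The parenthetical reduction can be made concrete with a small numerical sketch. It assumes that Theorem 1.1 of [11] has the Chernoff form $d\,[e^{\delta}/(1+\delta)^{1+\delta}]^{\mu_{\max}/B}$, and that the corollary reads $d\,(e\mu_{\max}/r)^{r/B}$ for $r \ge e\mu_{\max}$; the function names are hypothetical.

```python
import math

def tropp_form(d, mu_max, B, delta):
    """Assumed form of Theorem 1.1 in [11], evaluated at r = (1 + delta) * mu_max."""
    return d * (math.exp(delta) / (1 + delta) ** (1 + delta)) ** (mu_max / B)

def matrix_chernoff_tail(d, mu_max, B, r):
    """Corollary bound d * (e * mu_max / r) ** (r / B), stated for r >= e * mu_max."""
    if r < math.e * mu_max:
        raise ValueError("the bound is stated only for r >= e * mu_max")
    return d * (math.e * mu_max / r) ** (r / B)

# Illustrative values (assumptions): dimension 50, mu_max = 1.5, B = 0.5.
delta = 3.0
sharp = tropp_form(50, 1.5, 0.5, delta)
relaxed = matrix_chernoff_tail(50, 1.5, 0.5, (1 + delta) * 1.5)
```

Replacing $e^{\delta}$ by $e^{1+\delta}$ only weakens the bound, so `sharp <= relaxed`; the relaxed form is the one that is convenient to tune in the sequel because its base and exponent separate cleanly.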

3. Main results 

3.1. Singular-value concentration theorem.

Theorem 3.1. Let $r \in (0, 1)$, $\alpha \ge 1$. Let us be given a full-rank matrix
$X \in \mathbb{R}^{n \times p}$ and a positive integer $s$, such that

(3.5)    $\mu(X) \le \frac{r}{2(1 + \alpha)\log p},$

(3.6)    $s \le \frac{r^2}{4(1 + \alpha)e^2}\, \frac{p}{\|X\|^2 \log p}.$

Let $T \subset \{1, \ldots, p\}$ be a set with cardinality $s$, chosen randomly from the
uniform distribution. Then the following bound holds:

(3.7)    $\mathbb{P}(\|X_T^t X_T - \mathrm{Id}_s\| \ge r) \le \frac{216}{p^{\alpha}}.$

3.2. Remarks on the various constants. 

The constant 216 stems from the following decomposition: 2 (Poissonization)
× 36 (decoupling) × 3 (union bound). This constant might look large.
However, in many statistical applications, as in sparse models, $p$ is often
assumed to be very large.

Let us now compare the constants $C_s$ and $C_\mu$ in the inequalities

(3.8)    $\mu(X) \le \frac{C_\mu}{\log p},$

(3.9)    $s \le C_s\, \frac{p}{\|X\|^2 \log p},$

to the ones of [2]. The larger $C_s$ and $C_\mu$ are, the better the result is.

One of the various constraints on the rate $\alpha$ in [2] is given by the theorem of
Tropp in [10]. In this setting, $\alpha = 2\log 2$ and $r_0 = 1/2$, the authors' choice of
1/2 being unessential. To obtain such a rate $\alpha$, they need to impose the r.h.s.
of (3.15) in [2] to be less than 1/4, that is $30\,C_\mu + 13\sqrt{2 C_s} \le \frac{1}{4}$. This yields
$C_s < 1.19 \times 10^{-4}$. Choosing $C_s$ close to $1.19 \times 10^{-4}$, e.g. $C_s \simeq 1.18 \times 10^{-4}$,
we obtain:

$C_s \simeq 1.18 \times 10^{-4}, \qquad C_\mu \simeq 1.7 \times 10^{-3}.$

Our theorem allows one to choose any rate $\alpha > 0$. To make a fair comparison,
let us choose $\alpha = 2\log 2$ and $r = 1/2$. We obtain:

$C_s \simeq 3.5 \times 10^{-3}, \qquad C_\mu \simeq 0.1.$
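
The two values quoted for our theorem can be reproduced with a few lines of arithmetic. The sketch assumes that hypotheses (3.5) and (3.6) are read as $C_\mu = r/(2(1+\alpha))$ and $C_s = r^2/(4(1+\alpha)e^2)$; the function name `constants` is hypothetical.

```python
import math

def constants(alpha, r):
    """Constants C_s and C_mu implied by (3.5)-(3.6), under the stated reading."""
    c_mu = r / (2 * (1 + alpha))
    c_s = r ** 2 / (4 * (1 + alpha) * math.e ** 2)
    return c_s, c_mu

# Rate and threshold used in the comparison with [2].
c_s, c_mu = constants(alpha=2 * math.log(2), r=0.5)
```

With these inputs, `c_s` is about $3.5 \times 10^{-3}$ and `c_mu` is about $0.1$.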



4. Proof of Theorem 3.1

In order to study the invertibility condition, we want to obtain bounds for
the distribution tail of random sub-matrices of $H = X^t X - \mathrm{Id}$.

Let $R'$ be an independent copy of $R$. Let us recall two basic estimates:

$\|H\| \le \|X\|^2, \qquad \|H\|_{1 \to 2} \le \|X\|.$

As a preliminary, let us notice that

(4.10)    $\mathbb{P}(\|R_s H R_s\| \ge r) \le 2\,\mathbb{P}(\|RHR\| \ge r),$

which can actually be proven using the same kind of 'Poissonization argument'
as in Claim (3.29) p. 2173 in [2].

To study the tail distribution of $\|RHR\|$, we use a decoupling technique
which consists of replacing $\|RHR\|$ with $\|RHR'\|$.

Proposition 4.1. The operator norm of $RHR$ satisfies

(4.11)    $\mathbb{P}(\|RHR\| \ge r) \le 36\,\mathbb{P}(\|RHR'\| \ge r/2).$

The main feature of this inequality is that the numerical constants are
improved by a large factor when compared to the general result [5, Theorem
1 p. 224] (cf. Remark 5.1). In addition to this decoupling argument, we need
the following technical concentration result.



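Proposition 4.2. Let $X \in \mathbb{R}^{n \times p}$ be a full-rank matrix. For all parameters
$s, r, u, v$ such that $\frac{p}{s}\frac{r^2}{e} \ge u^2 \ge \frac{s}{p}\, e\, \|X\|^4$ and $v^2 \ge \frac{s}{p}\, e\, \|X\|^2$, the following bound
holds:

(4.12)    $\mathbb{P}(\|RHR'\| \ge r) \le 3\,p\,\mathcal{V}(s, [r, u, v]),$

with

$\mathcal{V}(s, [r, u, v]) = \left(e\,\frac{s}{p}\,\frac{u^2}{r^2}\right)^{r^2/v^2} + \left(e\,\frac{s}{p}\,\frac{\|X\|^4}{u^2}\right)^{u^2/\|X\|^2} + \left(e\,\frac{s}{p}\,\frac{\|X\|^2}{v^2}\right)^{v^2/\mu(X)^2}.$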

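We now have to analyze carefully the various quantities in Proposition 4.2
in order to obtain for $\mathbb{P}(\|RHR'\| \ge r/2)$ a bound of the order $e^{-\alpha \log p}$.
Set $\alpha' = \alpha + 1$ and $r' = r/2$. We tune the parameters so that

(4.13)    $\frac{u^2}{\|X\|^2} = \alpha' \log p,$

(4.14)    $\frac{v^2}{\mu(X)^2} = \alpha' \log p,$

(4.15)    $\frac{r'^2}{v^2} \ge \alpha' \log p,$

and

(4.16)    $e\,\frac{s}{p}\,\frac{\|X\|^4}{u^2} \le e^{-1},$

(4.17)    $e\,\frac{s}{p}\,\frac{\|X\|^2}{v^2} \le e^{-1},$

(4.18)    $e\,\frac{s}{p}\,\frac{u^2}{r'^2} \le e^{-1}.$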
A crucial quantity turns out to be $\frac{s}{p}\|X\|^2$. Keeping in mind that the
hypothesis on the coherence reads

(4.19)    $\mu(X) \le \frac{C_\mu}{\log p},$

it is necessary to impose that $s$ satisfies

(4.20)    $\frac{s}{p}\|X\|^2 \le \frac{C_s}{\log p}.$

The constants $C_\mu$ and $C_s$ will be tuned according to several constraints. The
equalities (4.13)-(4.14) determine the values of $u$ and $v$. It remains to show
that the previous inequalities are satisfied for a suitable choice of $C_\mu$ and $C_s$.
First, substituting (4.13) into (4.18), we obtain:

$\alpha'\,\frac{s}{p}\,\|X\|^2 \log p \le e^{-2} r'^2.$

Using (4.20), this is satisfied as soon as

(4.21)    $C_s \le \frac{r'^2}{\alpha' e^2}.$

Now, the bound (4.16) is satisfied if

$\frac{e^2 C_s}{\log p} \le \alpha' \log p.$

Based on (4.21), it suffices to have $r'^2/\alpha'^2 \le \log^2 p$, that is $p \ge e^{r'/\alpha'}$.
Second, substituting (4.14) into (4.17), we obtain:

$e^2\,\frac{s}{p}\,\|X\|^2 \le \alpha' \mu(X)^2 \log p.$

Using (4.19) and (4.20), it follows that

$C_\mu \ge e \sqrt{\frac{C_s}{\alpha'}}.$

Finally, (4.14)-(4.15) yield $r'^2 \ge \alpha'^2 \mu(X)^2 \log^2 p$. In view of (4.19), it thus
suffices to have $r' \ge \alpha' C_\mu$.

To reach the desired conclusion, in order to ensure the six previous constraints,
it suffices to choose $C_s$ and $C_\mu$ such that:

$C_\mu = \frac{r'}{\alpha'}, \qquad C_s = \frac{r'^2}{\alpha' e^2}.$

This completes the proof of Theorem 3.1.

5. Proof of the tail-decoupling and the concentration result
5.1. Proof of Proposition 4.1. Let us write

$RHR = \sum_{i \neq j} \delta_i \delta_j H_{ij}.$

Let $\{\eta_i\}$ be a sequence of i.i.d. Rademacher random variables,
mutually independent of $\mathcal{D} := \{\delta_i, 1 \le i \le p\}$. Following Bourgain and
Tzafriri [1], and de la Peña and Giné [6], we construct an auxiliary random
variable:

$Z = Z(\eta, \delta) := \sum_{i \neq j} (1 - \eta_i\eta_j)\delta_i\delta_j H_{ij}.$

Setting $Y = -\sum_{i \neq j} \delta_i\delta_j H_{ij}\eta_i\eta_j$, we can write

(5.22)    $Z = RHR + Y.$

For the sake of completeness, we recall basic arguments from Corollary
3.3.8 p. 12 in de la Peña and Giné [6] (applied to (5.22)) to obtain a lower
bound for $\mathbb{P}(\|Z\| \ge \|RHR\|)$. (We henceforth work conditionally on $\mathcal{D}$.)

The Hahn-Banach theorem gives a linear form $x^*$ on $\mathbb{R}^{p \times p}$ such that

$\mathbb{P}(\|Z\| \ge \|RHR\| \mid \mathcal{D}) \ge \mathbb{P}(x^*(Z) \ge x^*(RHR) \mid \mathcal{D})$

(5.23)    $\ge \mathbb{P}(x^*(Y) \ge 0 \mid \mathcal{D}).$

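For any centered real random variable $\xi$, one obtains using Hölder's inequality
twice (first with $\mathbb{E}|\xi| = 2\,\mathbb{E}\,\xi 1_{\{\xi \ge 0\}}$, second with $\mathbb{E}\,\xi^2 = \mathbb{E}\,|\xi|^{2/3}|\xi|^{4/3}$):

(5.24)    $\mathbb{P}(\xi \ge 0) \ge \frac{1}{4}\frac{(\mathbb{E}|\xi|)^2}{\mathbb{E}\,\xi^2} \ge \frac{1}{4}\frac{(\mathbb{E}\,\xi^2)^2}{\mathbb{E}\,\xi^4}.$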
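
The two-step Hölder chain can be checked on any centered discrete distribution. The sketch below assumes the chain reads $\mathbb{P}(\xi \ge 0) \ge (\mathbb{E}|\xi|)^2/(4\,\mathbb{E}\xi^2) \ge (\mathbb{E}\xi^2)^2/(4\,\mathbb{E}\xi^4)$; the function name and the two-point example are illustrative assumptions.

```python
import numpy as np

def holder_chain(values, probs):
    """Return P(xi >= 0) and the two successive lower bounds of the chain."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m1 = np.sum(probs * np.abs(values))      # E|xi|
    m2 = np.sum(probs * values ** 2)         # E xi^2
    m4 = np.sum(probs * values ** 4)         # E xi^4
    p_nonneg = np.sum(probs[values >= 0])    # P(xi >= 0)
    return p_nonneg, m1 ** 2 / (4 * m2), m2 ** 2 / (4 * m4)

# Centered two-point law: xi = 3 with probability 1/4, xi = -1 with probability 3/4.
p_nonneg, first_bound, second_bound = holder_chain([-1.0, 3.0], [0.75, 0.25])
```

Here $\mathbb{E}|\xi| = 3/2$, $\mathbb{E}\xi^2 = 3$ and $\mathbb{E}\xi^4 = 21$, so the chain reads $0.25 \ge 0.1875 \ge 9/84$.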

Noticing that $x^*(Y)$ is a centered homogeneous real chaos of order 2, we
deduce from (5.23), (5.24) and Lemma 2.1:

(5.25)    $\mathbb{P}(\|Z\| \ge \|RHR\| \mid \mathcal{D}) \ge \frac{1}{4 \times 9} = \frac{1}{36}.$

Multiplying both sides by $1_{\{\|RHR\| \ge r\}}$ and taking the expectation, one has

(5.26)    $\frac{1}{36}\,\mathbb{P}(\|RHR\| \ge r) \le \mathbb{P}(\|Z\| \ge r).$

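From now on, we can use similar arguments to [10, Prop. 2.1]. There is an
$\eta^* \in \{-1, 1\}^p$ for which

$\mathbb{P}(\|Z\| \ge r) = \mathbb{E}\,\mathbb{E}\big[1_{\{\|Z\| \ge r\}} \mid (\eta_i)\big] \le \mathbb{E}\,1_{\{\|Z(\eta^*, \delta)\| \ge r\}} = \mathbb{P}(\|Z(\eta^*, \delta)\| \ge r).$

Hence, setting $T = \{i : \eta^*_i = 1\}$, we can write

$Z(\eta^*, \delta) = 2 \sum_{j \in T,\ k \in T^c} \delta_j\delta_k H_{jk} + 2 \sum_{j \in T^c,\ k \in T} \delta_j\delta_k H_{jk}.$

Since $H$ is hermitian, we have

$\|Z(\eta^*, \delta)\| = 2\,\Big\|\sum_{j \in T,\ k \in T^c} \delta_j\delta_k H_{jk}\Big\|.$

Now, let $(\delta'_i)$ be an independent copy of $(\delta_i)$. Set $\tilde\delta_i = \delta_i$ if $i \in T$ and $\tilde\delta_i = \delta'_i$
if $i \in T^c$. Since the vectors $(\delta_i)$ and $(\tilde\delta_i)$ have the same law, we then obtain:

$\mathbb{P}(\|Z\| \ge r) \le \mathbb{P}\Big(2\,\Big\|\sum_{j \in T,\ k \in T^c} \delta_j\delta'_k H_{jk}\Big\| \ge r\Big).$

Re-introducing the missing entries in $H$ yields

$\mathbb{P}(\|Z\| \ge r) \le \mathbb{P}(\|RHR'\| \ge r/2),$

which concludes the proof of the proposition due to (5.26).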

Remark 5.1. The previous result can be seen as a special case of Theorem
1 p. 224 of the seminal paper [5]. Tracing the various constants involved in
this theorem, we obtained the inequality

(5.27)    $\mathbb{P}(\|RHR\| \ge r) \le 10^{\ldots}\ \ldots$



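5.2. Proof of Proposition 4.2. We first apply the NCCI to $\|RHR'\|$ by
conditioning on $R$.

Lemma 5.2. The following bound holds:

(5.28)    $\mathbb{P}(\|RHR'\| \ge r) \le \mathbb{P}(\|RH\| \ge u) + \mathbb{P}(\|RH\|_{1 \to 2} \ge v) + p \left(e\,\frac{s}{p}\,\frac{u^2}{r^2}\right)^{r^2/v^2}.$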



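Proof. We have $\|RHR'\|^2 = \|RHR'^2HR\|$. But $R'^2 = R'$, so

(5.29)    $\mathbb{P}(\|RHR'\| \ge r) = \mathbb{P}(\|RHR'HR\| \ge r^2).$

We will first compute the conditional probability

(5.30)    $\mathbb{P}(\|RHR'HR\| \ge r^2 \mid R) := \mathbb{E}\big[1_{\{\|RHR'HR\| \ge r^2\}} \mid R\big].$

Let $Z_j$ be the $j$-th column of $RH$, $j \in \{1, \ldots, p\}$. Notice that

$RHR'HR = \sum_{j=1}^p \delta'_j Z_j Z_j^t.$

Since $\sum_{j=1}^p Z_j Z_j^t = RH^2R$ and $\|Z_j Z_j^t\| = \|Z_j\|_2^2$, we then obtain

(5.31)    $\|\delta'_j Z_j Z_j^t\| \le \|RH\|_{1 \to 2}^2,$

(5.32)    $\Big\|\sum_{j=1}^p \mathbb{E}\big[\delta'_j Z_j Z_j^t \mid R\big]\Big\| \le \frac{s}{p}\|RH\|^2.$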
The NCCI then yields

(5.33)    $\mathbb{P}(\|RHR'HR\| \ge r^2 \mid R) \le p \left(e\,\frac{s}{p}\,\frac{\|RH\|^2}{r^2}\right)^{r^2/\|RH\|_{1 \to 2}^2},$

provided that

(5.34)    $e\,\frac{s}{p}\,\frac{\|RH\|^2}{r^2} \le 1.$

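Let us now introduce the events

$A = \{\|RHR'HR\| \ge r^2\}; \quad B = \{\|RH\| \ge u\}; \quad C = \{\|RH\|_{1 \to 2} \ge v\}.$

We have

$\mathbb{P}(A) = \mathbb{P}(A \mid B \cup C)\,\mathbb{P}(B \cup C) + \mathbb{P}(A \cap B^c \cap C^c)$
$\le \mathbb{P}(B) + \mathbb{P}(C) + \mathbb{P}(A \cap B^c \cap C^c).$

The identity $\mathbb{P}(A \cap B^c \cap C^c) = \mathbb{E}[1_{A \cap B^c \cap C^c}] = \mathbb{E}[\mathbb{P}(A \mid R)\,1_{B^c \cap C^c}]$ concludes
the lemma. □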

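We now have to control the norm of $RH^2R$, the norm of $RH$ and the
column norm of $RH$. Let us begin with $\|RH\| = \|HR\|$.

Lemma 5.3. The following bounds hold:

$\mathbb{P}(\|HR\| \ge u) \le p \left(e\,\frac{s}{p}\,\frac{\|X\|^4}{u^2}\right)^{u^2/\|X\|^2},$

$\mathbb{P}(\|RH\|_{1 \to 2} \ge v) \le p \left(e\,\frac{s}{p}\,\frac{\|X\|^2}{v^2}\right)^{v^2/\mu(X)^2},$

provided that $e\,\frac{s}{p}\,\frac{\|X\|^4}{u^2}$ and $e\,\frac{s}{p}\,\frac{\|X\|^2}{v^2}$ are less than 1.

Proof. The steps are of course the same as what we have just done in the
proof of Lemma 5.2. Notice that

$\mathbb{P}(\|RH\| \ge u) = \mathbb{P}(\|HR\|^2 \ge u^2) = \mathbb{P}(\|HRH\| \ge u^2).$

The $j$-th column of $H$ is $H_j = X^t X_j - e_j$. Moreover,

(5.35)    $HRH = \sum_{j=1}^p \delta_j H_j H_j^t.$

We have $\|H_j H_j^t\| = \|H_j\|_2^2 \le \|H\|_{1 \to 2}^2 \le \|X\|^2$ and

(5.36)    $\Big\|\sum_{j=1}^p \mathbb{E}\big[\delta_j H_j H_j^t\big]\Big\| \le \frac{s}{p}\|H\|^2 \le \frac{s}{p}\|X\|^4.$

We finally deduce from the NCCI that

(5.37)    $\mathbb{P}(\|HRH\| \ge u^2) \le p \left(e\,\frac{s}{p}\,\frac{\|X\|^4}{u^2}\right)^{u^2/\|X\|^2}.$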

Let us now control the supremum $\ell_2$-norm of the columns of $RH$. Set

(5.38)    $M = \sum_{k=1}^p \delta_k\,\mathrm{diag}(H_k H_k^t).$

Notice that

$\|RH\|_{1 \to 2}^2 = \max_{k=1}^{p} \|(RH)_k\|_2^2 = \|\mathrm{diag}((RH)^t RH)\| = \|\mathrm{diag}(H^t R H)\|.$

Thus, using the symmetry of $H$ and interchanging the summation and the "diag"
operation, we obtain that $\|RH\|_{1 \to 2}^2 = \|M\|$. Moreover, we have for all
$k \in \{1, \ldots, p\}$,

(5.39)    $\|\mathrm{diag}(H_k H_k^t)\| = \max_{j \neq k} (X_j^t X_k)^2 \le \mu(X)^2,$

and

$\|\mathbb{E}\,M\| = \frac{s}{p}\|\mathrm{diag}(HH^t)\| = \frac{s}{p}\|H\|_{1 \to 2}^2 \le \frac{s}{p}\|X\|^2.$

Applying the NCCI completes the lemma. □
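
The identification of the squared column norm of $RH$ with the norm of a sum of diagonal matrices can be verified numerically. The sketch assumes $M = \sum_k \delta_k\,\mathrm{diag}(H_k H_k^t)$ with $H_k$ the $k$-th column of $H$; sizes, seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, s = 20, 30, 6                              # illustrative sizes (assumption)
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                   # unit-norm columns
H = X.T @ X - np.eye(p)                          # hollow Gram matrix
delta = (rng.random(p) < s / p).astype(float)    # diagonal of R

RH = delta[:, None] * H
col_norm_sq = np.max(np.sum(RH ** 2, axis=0))    # ||RH||_{1->2}^2

# diag(H_k H_k^t) is the diagonal matrix with entries H[j, k]^2.
M = sum(delta[k] * np.diag(H[:, k] ** 2) for k in range(p))
norm_M = np.linalg.norm(M, 2)

off = H - np.diag(np.diag(H))
mu = np.abs(off).max()                           # coherence of X
summand_norms = [np.max(H[:, k] ** 2) for k in range(p)]
```

Both quantities coincide because $H$ is symmetric, and every summand has norm at most $\mu(X)^2$, which plays the role of the constant $B$ in the NCCI.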

Acknowledgment. The authors thank the referee for valuable comments
that improved the paper. They thank Max Hügel for pointing out a mistake
in a constant involved in a previous version of the tail-decoupling inequality.